AD-A252  755 



NUSC  Technical  Report  7064 
15  October  1991 


Feature  Detection  for  Model 
Assessment  in  State  Estimation 


D.  J.  Ferkinhoff 

J.  G.  Baylog 

K.  F.  Gong 

Combat  Control  Systems  Department 
S.  C.  Nardone 

University  of  Massachusetts  Dartmouth 




Naval  Underwater  Systems  Center 

Newport,  Rhode  Island  •  New  London,  Connecticut 


Approved for public release; distribution is unlimited.



92-18287 



PREFACE 

This  work  was  conducted  under  the  Submarine  Contact 
Management  Project  of  the  Combat  Control  Block  Program,  with 
support  under  Contract  N66604-90-D 104-0003.  The  NUSC 
principal  investigator  is  K.F.  Gong  (Code  2211).  The  sponsoring 
activity  is  the  Chief  of  Naval  Research,  Office  of  Naval 
Technology, program manager D. C. Houser (OCNR-232).

The  technical  reviewer  for  this  report  was  V.  J.  Aidala 
(Code  2211). 


REVIEWED AND APPROVED: 15 OCTOBER 1991


P.  A.  LaBrecque 

Head,  Combat  Control  Systems  Department 


REPORT DOCUMENTATION PAGE

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE
15 October 1991

3. REPORT TYPE AND DATES COVERED

4. TITLE AND SUBTITLE
Feature Detection for Model Assessment in State Estimation

6. AUTHOR(S)
D. J. Ferkinhoff, J. G. Baylog, K. F. Gong, S. C. Nardone*

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Naval Underwater Systems Center
Newport Laboratory
Newport, Rhode Island 02841-5047

8. PERFORMING ORGANIZATION REPORT NUMBER
TR 7064

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
Chief of Naval Research
Office of Naval Technology (OCNR-232)
Arlington, VA 22203

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES
*S. C. Nardone is affiliated with the University of Massachusetts Dartmouth,
North Dartmouth, MA 02747.

12a. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution is unlimited.

12b. DISTRIBUTION CODE

13. ABSTRACT (Maximum 200 words)

The presence of deterministic features in a residual sequence is oftentimes
an indication of a modeling error in the state estimation process. Here, three
methods of detecting and extracting the features of "jump" and/or "drift" in a
predicted residual sequence are developed. Two of the methods are traditional
ones. The first is a multiple-hypothesis generalized likelihood ratio test
that results in a chi-squared (XSQ) test statistic. The second is a similar,
but computationally more efficient, intuitively derived test resembling a
modified Neyman-Pearson (MNP) test. The third method is a nontraditional one;
it uses a backpropagation artificial neural network (ANN) trained to emulate the
MNP test. Monte Carlo experimental results show that the XSQ and MNP give
essentially identical results, while the ANN, although apparently outperforming
the XSQ in feature detection, does so at the expense of a higher feature
misclassification probability, which is an undesirable effect. Overall, the ANN
is judged as a feasible approach to feature detection, and improved performance
is expected with better network training.


14. SUBJECT TERMS
Underwater Tracking, Target Motion Analysis, Statistical Theory,
Artificial Neural Networks, Mathematical Models

15. NUMBER OF PAGES
53

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT
UNCLASSIFIED

18. SECURITY CLASSIFICATION OF THIS PAGE
UNCLASSIFIED

19. SECURITY CLASSIFICATION OF ABSTRACT
UNCLASSIFIED

20. LIMITATION OF ABSTRACT

NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89)

TABLE  OF  CONTENTS 


Section  Page 

LIST OF ILLUSTRATIONS .  ii

LIST OF TABLES .  ii

1.  INTRODUCTION .  1 


2.  PROCESS  MODEL .  3 

3.  BINARY  HYPOTHESIS  TEST . 11 

3.1  Likelihood  Ratio  Test  (LRT) .  11 

3.2  Generalized  Likelihood  Ratio  Test  (GLRT) .  19 

3.3  Artificial  Neural  Network  (ANN)  Hypothesis  Test .  25 

3.4  Training  the  Artificial  Neural  Network .  27 

4.  MULTIPLE  HYPOTHESIS  TEST .  29 

4.1  Likelihood  Ratio  Test .  29 

4.2  Modified  Neyman-Pearson  Test .  37 

4.3  Artificial  Neural  Network .  38 

5.  EXPERIMENTAL  RESULTS  AND  DISCUSSION .  39 

6.  SUMMARY  AND  CONCLUSIONS .  47 

REFERENCES .  48 




LIST  OF  ILLUSTRATIONS 


Figure  Page 

1  State  Estimation  Process .  1 

2  Binary  LRT .  16 

3  LRT  Receiver  Operating  Characteristic .  16 

4  Binary  GLRT .  23 

5  GLRT  Receiver  Operating  Characteristic .  24 

6  Typical  Single  Neuron .  25 

7  Typical Feedforward Network with Two Hidden Layers .  26

8  Illustration of the Sigmoid Function .  28

9  Illustration of Network Training .  28

10  ANN Receiver Operating Characteristic (Jump Only) .  41

11  Theoretical GLRT Receiver Operating Characteristic (Jump Only) .  41

12a  XSQ Receiver Operating Characteristic (Jump) .  42

12b  ANN Receiver Operating Characteristic (Jump) .  42

13a  XSQ Jump Output Given Drift Only .  43

13b  XSQ Jump Output Given Jump Only .  43

13c  XSQ Jump Output Given Drift with Same Polarity .  44

13d  XSQ Jump Output Given Drift with Opposite Polarity .  44

14a  ANN Jump Output Given Drift Only .  45

14b  ANN Jump Output Given Jump Only .  45

14c  ANN Jump Output Given Drift with Same Polarity .  46

14d  ANN Jump Output Given Drift with Opposite Polarity .  46


LIST  OF  TABLES 

Table  Page 

1  Multiple  Hypothesis  Test .  37 


FEATURE  DETECTION  FOR  MODEL  ASSESSMENT 
IN  STATE  ESTIMATION 


1.  INTRODUCTION 


This report addresses the problem of feature detection for the purpose of assessing the
validity of models used in state estimation. The state estimation process is depicted schematically
in figure 1. Here, the residuals, that is, the differences between the observed measurements and
estimates of the measurements (based on the state estimate), are mapped into a correction term
for the state estimate through a variant of the gradient equations relating the measurements to the
state. The estimation algorithm converges to an estimate that provides a "best fit" to the observed
measurements. When the system model sufficiently reflects the actual process by which the
measurements were produced, the residuals are noise-like in character. That is, they are devoid of
any deterministic features.


Figure 1. State Estimation Process


Oftentimes, indications of errors in the system model manifest themselves as deterministic
features in the residuals. Consequently, the processing of measurement data for state estimation in
an uncertain modeling environment requires continuous scrutiny of measurements and
measurement residuals. Once a feature, or set of features, has been detected, appropriate
techniques can be implemented to compensate or identify and correct for the modeling error. One
such system, TARSIA, has been considered previously for model assessment (reference 1).
In TARSIA, the detected residual features are interpreted in an evidential reasoning system to
identify the possible modeling anomaly. The work presented in this report extends the feature
detection capability of TARSIA.

Specifically, this report details three methods to detect and extract the features of jump
and/or drift in a predicted residual sequence. Two of the methods, a generalized likelihood
chi-squared (XSQ) test and a modified Neyman-Pearson (MNP) test, are classical numerical
approaches. The third method uses an artificial neural network (ANN) and is a nontraditional
approach. The likelihood ratio test (LRT) and generalized likelihood ratio test (GLRT) for the
binary hypothesis test are developed to provide a background for future derivations. Additionally,
a backpropagation ANN is trained to emulate the LRT for the binary hypothesis test. Building on
the binary test leads to development of the more complicated multiple hypothesis GLRT, which is
shown to result in an XSQ statistic. An intuitive approach, based on the XSQ test, results in an
MNP test that is more computationally efficient than the XSQ test. Because of this computational
efficiency, the multiple hypothesis ANN is trained to emulate the MNP test.

Experimental results, obtained from Monte Carlo simulations using synthetic data, are
presented in the form of receiver operating characteristic (ROC) curves. For the binary hypothesis
test, the ANN and GLRT give nearly identical results and are in good agreement with the
theoretically predicted performance. For the multiple hypothesis test, the XSQ test and the MNP
test give essentially identical results, while the ANN performance departs somewhat from that of
the conventional tests. Although the ANN outperforms the XSQ test in the probability of feature
detection, it does so at the expense of a higher feature misclassification (i.e., detecting feature A
when feature B is present), which is undesirable. Overall, the performance of the ANN
demonstrates the feasibility of the technique but also indicates that the training of the network may
have been incomplete.

The  remainder  of  this  report  is  organized  into  five  sections.  Section  2  defines  the  process 
and  mathematical  models  under  consideration.  Section  3  presents  the  binary  hypothesis  test  as 
background  and  discusses  the  ANN  implemented  for  that  problem.  Section  4  extends  the  binary 
hypothesis test to multiple hypotheses, develops the XSQ test, presents the MNP test, and
discusses  the  multiple-feature  ANN.  Section  5  presents  the  experimental  results  and  discusses 
their  implications.  Section  6  offers  a  summary  and  conclusions. 




2.  PROCESS  MODEL 


Consider the following discrete-time dynamic state model that generates measurements $z(k)$:

$$x(k+1) = \Phi x(k) + B u(k) + \Gamma w(k), \tag{1a}$$

$$z(k) = H[x(k)] + n(k). \tag{1b}$$

Here, $x(k)$ is the $K \times 1$ dimensional state vector; $u$ is the input vector; $w$ is zero-mean, white,
Gaussian process noise; $z$ is the measurement; and $n$ is zero-mean, white, Gaussian measurement
noise that is independent of the process noise. The matrices $\Phi$, $B$, and $\Gamma$ are, respectively, the state
transition matrix, the input matrix, and the process noise input matrix. The function $H$ is the
nonlinear function relating the state to the measurement, and $k$ is the time index. The state $x(k)$ is
available only as an estimate $\hat{x}(k|k)$, along with the estimated error covariance matrix $P(k|k)$. The
estimate $\hat{x}(k|k)$ of the state at time $k$ is based on all the past data up to and including time $k$. The
estimate is available from a suitable estimator such as an extended Kalman filter or a maximum
likelihood estimator.

The residual $\tilde{z}(k|k-1)$ is the difference between the measurement $z(k)$ and the estimate of
the measurement $\hat{z}(k|k-1)$; that is,

$$\tilde{z}(k|k-1) = z(k) - \hat{z}(k|k-1), \tag{2a}$$

where

$$\hat{z}(k|k-1) = H[\hat{x}(k|k-1)], \tag{2b}$$

$$\hat{x}(k|k-1) = \Phi\,\hat{x}(k-1|k-1) + B u(k-1), \tag{2c}$$

and $\hat{x}(k|k-1)$ is the state estimate at time $k$ based on the measurements up to and including time
$k-1$ but not time $k$. The process to be investigated here is the predicted residual sequence

$$\tilde{Z} = [\tilde{z}_1, \tilde{z}_2, \ldots, \tilde{z}_K]^T, \tag{3a}$$

where

$$\tilde{z}_i = \tilde{z}(k+i|k) = z(k+i) - \hat{z}(k+i|k), \tag{3b}$$

$$\hat{z}(k+i|k) = H[\hat{x}(k+i|k)], \tag{3c}$$

$$\hat{x}(k+i|k) = \Phi^i\,\hat{x}(k|k) + \sum_{j=1}^{i} \Phi^{i-j} B u(k+j-1), \tag{3d}$$

and $\hat{x}(k+i|k)$ is produced by propagating the state estimate at time $k$ forward in time without
updating the estimate.
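The propagation in equation (3d) can be illustrated with a minimal sketch. The function name `predicted_residuals`, the constant-velocity model, the stand-in measurement function `h`, and all numerical values below are assumptions chosen for illustration; they are not taken from the report. The recursion applies the one-step prediction repeatedly, which is equivalent to (3d).

```python
import numpy as np

def predicted_residuals(x_hat, Phi, B, u_future, h, z_future):
    """Propagate x_hat(k|k) forward without measurement updates and
    return the residuals z(k+i) - H[x_hat(k+i|k)] for i = 1..K."""
    x_pred = np.asarray(x_hat, dtype=float)
    residuals = []
    for u, z in zip(u_future, z_future):
        # One-step prediction; repeating it i times reproduces equation (3d).
        x_pred = Phi @ x_pred + B @ u
        residuals.append(z - h(x_pred))
    return np.array(residuals)

# Hypothetical example: state = [position, velocity], position is measured.
T = 1.0
Phi = np.array([[1.0, T], [0.0, 1.0]])
B = np.zeros((2, 1))                 # no known input in this example
h = lambda x: x[0]                   # stand-in for the measurement function H
x_hat = np.array([0.0, 1.0])         # assumed state estimate at time k
u_future = [np.zeros(1)] * 3
z_future = [1.0, 2.5, 3.0]           # made-up measurements at k+1, k+2, k+3
res = predicted_residuals(x_hat, Phi, B, u_future, h, z_future)
```

In this made-up case only the second measurement departs from the prediction, so only the second residual is nonzero.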

Examining the $i$th element of the predicted residual sequence,

$$\tilde{z}(k+i|k) = H[x(k+i)] - H[\hat{x}(k+i|k)] + n(k+i), \tag{4}$$

and expanding the measurement function in a Taylor series about the estimate, with

$$\delta x(k+i) = \hat{x}(k+i|k) - x(k+i), \tag{5}$$

yields the approximate expression

$$\tilde{z}(k+i|k) \approx -a^T[\hat{x}(k+i|k)]\,\delta x(k+i) + n(k+i), \tag{6a}$$

where

$$a[\hat{x}(k+i|k)] = \left.\frac{\partial H[x(k+i)]}{\partial x(k+i)}\right|_{x(k+i) = \hat{x}(k+i|k)}. \tag{6b}$$


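The dynamic equation for the propagation of the state estimation error in terms of the fixed error at
time $k$ is

$$\delta x(k+i) = \Phi^i\,\delta x(k) - g^T(k+i)\,W, \tag{7a}$$

where

$$g^T(k+i) = [\Phi^{i-1}\Gamma, \ldots, \Phi\Gamma, \Gamma, 0, \ldots, 0], \tag{7b}$$

and

$$W^T = [w(k), w(k+1), \ldots, w(k+i-2), w(k+i-1), w(k+i), \ldots, w(K-1)]. \tag{7c}$$

The entire predicted residual sequence can be approximated as

$$\tilde{Z} \approx -A[\hat{x}(k|k)]\,\delta x(k) - GW + N_K, \tag{8a}$$

where

$$A[\hat{x}(k|k)] = \begin{bmatrix} \vdots \\ \left\{\dfrac{\partial H[x(k+i)]}{\partial x(k)}\right\}^T \\ \vdots \end{bmatrix}_{x(k) = \hat{x}(k|k)} = \begin{bmatrix} \vdots \\ a^T[\hat{x}(k+i|k)]\,\Phi^i \\ \vdots \end{bmatrix}, \tag{8b}$$

and

$$N_K = \begin{bmatrix} \vdots \\ n(k+i) \\ \vdots \end{bmatrix}, \qquad G = \begin{bmatrix} \vdots \\ g^T(k+i) \\ \vdots \end{bmatrix}. \tag{8c}$$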


The predicted residual sequence can be approximated as a jointly Gaussian random vector.
For unbiased state estimates, $\tilde{Z}$ is zero mean and its probability density function is

$$P(\tilde{Z}) = \left[(2\pi)^{K/2} (\det V)^{1/2}\right]^{-1} \exp\left[-\tfrac{1}{2}\,\tilde{Z}^T V^{-1} \tilde{Z}\right]. \tag{9a}$$

Here, $V$ is the covariance matrix of the predicted residual vector defined as

$$\mathrm{cov}(\tilde{Z}) = E[\tilde{Z}\tilde{Z}^T] - E[\tilde{Z}]\,E[\tilde{Z}^T], \tag{9b}$$

$$\mathrm{cov}(\tilde{Z}) = E[(N_K - A\,\delta x - GW)(N_K - A\,\delta x - GW)^T]. \tag{9c}$$

Thus,

$$V = A P(k|k) A^T + S^2 + G Q G^T, \tag{9d}$$

where

$$S^2 = E[N_K N_K^T], \tag{9e}$$

$$P(k|k) = E[\delta x(k)\,\delta x^T(k)], \tag{9f}$$

$$Q = E[W W^T]. \tag{9g}$$

Note that the presence of process noise in the predicted residual sequence results in an effective
noise

$$N_e = N_K - GW, \tag{10a}$$

with an increased covariance

$$S_e^2 = S^2 + G Q G^T. \tag{10b}$$


Further, it is noted that the probability density function for the predicted residuals is only
approximately Gaussian. However, the approximation remains valid whenever the second-order
and higher terms of the Taylor series expansion of the measurement nonlinearity can be ignored.
Thus, two assumptions on the predicted residual sequence are invoked to ensure the
reasonableness of the Taylor series approximation. First, the dynamic model of equation (1),
which is used to obtain the estimate, must be a reasonably accurate description of the real process.
Second, the state error, and its effect over the time interval of interest, must remain sufficiently
small to allow the validity of a first-order Taylor series approximation of the predicted residual
sequence. These two assumptions are the basis of the null hypothesis, i.e.,

$H_0$: No modeling error and a "good" state estimate.

Under the $H_0$ hypothesis, the predicted residual process is a zero-mean, jointly Gaussian
random vector; that is, the probability density function is

$$P(\tilde{Z}|H_0) = \left[(2\pi)^{K/2} (\det V)^{1/2}\right]^{-1} \exp\left[-\tfrac{1}{2}\,\tilde{Z}^T V^{-1} \tilde{Z}\right], \tag{11}$$

where $V$ is the covariance matrix given by equation (9d). Note that although elements of the
predicted residual vector are correlated, there always exists a set of basis vectors in which $\tilde{Z}$ can be
represented by uncorrelated (and hence independent) components. For many applications
considered here, both process noise and state estimation error covariances are small compared with
the measurement noise covariance. For these cases, the elements of $\tilde{Z}$ are also independent.

Under the alternate hypothesis $H_1$, there is a modeling error. The original state and
measurement models of equations (1) do not adequately describe the system dynamics. Modeling
errors may arise from unknown system inputs, a change in the measurement model, or an
unobserved change in the measurement noise. Examples of such modeling anomalies are,
respectively, a target maneuver, a change in propagation path, and a measurement bias. Regardless of
the cause of the modeling error, its observable effect is often a feature in the predicted residual
sequence.

Under the mismodeling hypothesis $H_1$, the predicted residuals exhibit features that are
distinguishable from those of hypothesis $H_0$. For hypothesis $H_1$, a $(q-1)$th-order polynomial is
used to model the anomalous features of the predicted residuals; that is,

$$H_1:\ \tilde{z}(k+i|k) = -a^T[\hat{x}(k+i|k)]\,\Phi^i\,\delta x(k) + m(k+i) + n(k+i) - g^T(k+i)\,W, \tag{12a}$$

where

$$m(k+i) = \left[a_0 + a_1(t_{k+i} - t_0) + a_2(t_{k+i} - t_0)^2 + \cdots + a_{q-1}(t_{k+i} - t_0)^{q-1}\right] u(t_{k+i} - t_0), \tag{12b}$$

$$u(t) = \begin{cases} 1, & t \ge 0, \\ 0, & \text{otherwise,} \end{cases}$$

and $t_0$ is the time of the modeling anomaly. The predicted residual sequence under $H_1$ is

$$H_1:\ \tilde{Z} = -A[\hat{x}(k|k)]\,\delta x(k) - GW + M_1 + N_K, \tag{13a}$$

where

$$M_1 = B\,m, \tag{13b}$$

$B$ is the matrix whose $i$th row is $u(t_{k+i} - t_0)\,[1,\ (t_{k+i} - t_0),\ (t_{k+i} - t_0)^2,\ \ldots,\ (t_{k+i} - t_0)^{q-1}]$
(so that rows preceding the anomaly time are zero), and $m = [a_0, a_1, \ldots, a_{q-1}]^T$.
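As an illustration of the polynomial feature model of equations (12b) and (13b), the following sketch builds the matrix $B$ and the mean vector $M_1 = Bm$. The helper name `feature_matrix`, the sample times, the anomaly time, and the coefficient values are hypothetical; the unit step is assumed to switch on at $t = t_0$.

```python
import numpy as np

def feature_matrix(times, t0, q):
    """Row i is u(t_i - t0) * [1, (t_i - t0), ..., (t_i - t0)^(q-1)]."""
    dt = np.asarray(times, dtype=float) - t0
    step = (dt >= 0).astype(float)                 # unit step, zero before the anomaly
    powers = np.vander(dt, N=q, increasing=True)   # columns dt^0 ... dt^(q-1)
    return step[:, None] * powers

times = np.arange(5.0)            # hypothetical sample times
Bmat = feature_matrix(times, t0=2.0, q=2)
m = np.array([1.0, 0.5])          # made-up jump a0 = 1 and drift a1 = 0.5
M1 = Bmat @ m                     # mean of the residual sequence under H1
```

With these values the mean is zero before the anomaly, jumps by $a_0$ at $t_0$, and then grows linearly at rate $a_1$.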


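The $M_1$ term contributes to the mean value of $\tilde{Z}$; hence,

$$E[\tilde{Z}] = M_1, \tag{14a}$$

and

$$\mathrm{cov}(\tilde{Z}) = E[(\tilde{Z} - M_1)(\tilde{Z} - M_1)^T] = V. \tag{14b}$$

Thus, the covariance under both hypotheses is the same and only the means differ. The
probability density function (PDF) for the jointly Gaussian random vector under $H_1$ is

$$P(\tilde{Z}|H_1) = \left[(2\pi)^{K/2} (\det V)^{1/2}\right]^{-1} \exp\left[-\tfrac{1}{2}\,(\tilde{Z} - M_1)^T V^{-1} (\tilde{Z} - M_1)\right]. \tag{15}$$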

Similar results can be developed for both the smoothed and filtered residuals. For the
smoothed residual process, the state estimation error is correlated with the past measurement noise.
The conditional PDF or likelihood function for the observations (no process noise)

$$Z = [z_1, \ldots, z_K]^T, \tag{16}$$

given the state $x$, is

$$P(Z|x) \Rightarrow N[Z(x), S^2], \tag{17a}$$

or

$$P(Z|x) = \left[(2\pi)^{K/2} (\det S^2)^{1/2}\right]^{-1} \exp\left\{-\tfrac{1}{2}\,[Z - Z(x)]^T S^{-2} [Z - Z(x)]\right\}, \tag{17b}$$

where

$$Z(x) = \left[H[x(1)], \ldots, H[x(K)]\right]^T. \tag{17c}$$


The maximum likelihood estimate is the value of the state $x$ that maximizes the likelihood function
of equation (17). Taking the logarithm of the likelihood function yields the log-likelihood
equation:

$$\ln[P(Z|x)] = [\text{constant}] - \tfrac{1}{2}\,[Z - Z(x)]^T S^{-2} [Z - Z(x)]. \tag{18}$$

Differentiating the log-likelihood equation with respect to the state at time $k$ and setting the result to
zero, one can obtain the maximum likelihood estimate by solving the equation

$$A^T[x(k)]\,S^{-2}\,\big[Z - Z[x(k)]\big]\Big|_{x(k) = \hat{x}(k|k)} = 0. \tag{19}$$


For the linearized smoothed residuals,

$$\tilde{Z}_s = -A[\hat{x}(k|k)]\,\delta x(k) + N_K, \tag{20}$$

where $A[\,\cdot\,]$ is defined in equation (8b). The error in the estimate $\delta x(k)$ is approximately

$$\delta x(k) \approx [A^T S^{-2} A]^{-1} A^T S^{-2} N_K, \tag{21a}$$

and the state error covariance is

$$P(k|k) = E\{\delta x(k)\,\delta x^T(k)\}, \tag{21b}$$

which can be approximated as

$$P(k|k) \approx [A^T S^{-2} A]^{-1}. \tag{21c}$$

Consequently, from equation (9), the PDF for the smoothed residuals under $H_0$ is

$$P(\tilde{Z}_s|H_0) \Rightarrow N(0, V_s), \tag{22a}$$

where

$$V_s = \mathrm{cov}(\tilde{Z}_s),$$

or

$$V_s = E[(N_K - A\,\delta x)(N_K - A\,\delta x)^T];$$

substituting from equation (21a) for $\delta x(k)$ yields

$$V_s = S^2 - A P(k|k) A^T. \tag{22b}$$
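The reduction to equation (22b) can be checked numerically. The sketch below is only a consistency check under assumed values: the matrix `A` and the diagonal noise covariance `S2` are random stand-ins, not quantities from the report. It propagates the noise covariance through the linear map implied by equations (20) and (21a) and compares the result with the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 6, 2                                # assumed: 6 measurements, 2 states
A = rng.standard_normal((K, n))            # stand-in measurement gradient matrix
S2 = np.diag(rng.uniform(0.5, 2.0, K))     # stand-in measurement noise covariance S^2
S2_inv = np.linalg.inv(S2)

P = np.linalg.inv(A.T @ S2_inv @ A)        # equation (21c)
gain = P @ A.T @ S2_inv                    # maps N_K to delta-x, equation (21a)
proj = np.eye(K) - A @ gain                # Z_s = (I - A gain) N_K, from equation (20)
Vs_direct = proj @ S2 @ proj.T             # covariance of Z_s by direct propagation
Vs_closed = S2 - A @ P @ A.T               # closed form, equation (22b)
```

The two matrices agree to numerical precision, which reflects the fact that fitting the state to the data removes part of the noise variance from the smoothed residuals.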


Under hypothesis $H_1$ (i.e., a modeling anomaly exists), the PDF of the predicted residuals
is (possibly) nonzero-mean jointly Gaussian:

$$P(\tilde{Z}|H_1) \Rightarrow N(M_1, V), \tag{23}$$

where $M_1$ is the mean vector defined in equation (13), $V$ is the covariance matrix given by equation
(9d), and $N(M_1, V)$ is the Gaussian PDF with mean $M_1$ and covariance $V$. Similarly, under
hypothesis $H_1$ for the smoothed residuals,

$$P(\tilde{Z}_s|H_1) \Rightarrow N(M_1, V_s). \tag{24}$$

When the filtered residual sequence is obtained from the extended Kalman filter, the
estimate $\hat{x}(k|k)$ is the conditional mean, and the innovations process is white (reference 2). For this
process, the PDFs are

$$H_0:\ P(\tilde{Z}_f|H_0) \Rightarrow N(0, V_f), \tag{25a}$$

$$H_1:\ P(\tilde{Z}_f|H_1) \Rightarrow N(M_1, V_f), \tag{25b}$$

where

$$V_f = \mathrm{diag}\left[a^T P(i|i)\,a + \sigma_n^2\right]. \tag{25c}$$

The above development characterizes three processes: predicted residuals, smoothed
residuals, and filtered (conditional mean) residuals. The predicted residuals are obtained from
future measurements and the state prediction; the smoothed residuals are obtained by fitting the
past data with the best state estimate; the filtered residuals, usually obtained from a Kalman filter,
involve the current state estimate and the next measurement. While the filtered innovations
obtained from a Kalman filter are a white process, both the smoothed and predicted residuals are
correlated to some degree; however, for small state estimation error, they can often be
approximated as a white process. In this investigation, attention is focused on the predicted
residuals, and for some results is limited to the case of negligible state estimation error.


3.  BINARY  HYPOTHESIS  TEST 


The optimum hypothesis test, for either Bayes or Neyman-Pearson criteria, is the
likelihood ratio test. In this section, the two binary hypothesis tests that decide whether a modeling
anomaly is present are presented. They are the likelihood ratio test (LRT) and the generalized
likelihood ratio test (GLRT). The LRT is applicable to detection of known signals, while the
GLRT is appropriate for the detection of a known signal form having unknown values. Simple
examples of the detection of signals in noise are given along with experimental results. The
design, implementation, and testing of an artificial neural network (ANN) for the binary hypothesis
test are also presented. The development of the binary hypothesis test of this section is a precursor
to the multiple hypothesis test of the next section.


3.1  LIKELIHOOD  RATIO  TEST  (LRT) 

The (binary) LRT compares the ratio of likelihood functions from two hypotheses to a
threshold and selects one of the two hypotheses as being true. The predicted residual probability
density functions (PDFs) from section 2 are used as the likelihood functions. The two hypotheses
are

$H_0$: no modeling anomaly and a "good" state estimate ($M_1 = 0$),
$H_1$: presence of a modeling anomaly ($M_1 \ne 0$).

Under these two hypotheses, the likelihood ratio

$$\frac{P(\tilde{Z}|H_1)}{P(\tilde{Z}|H_0)} \underset{H_1}{\overset{H_0}{\lessgtr}} \lambda \tag{26}$$

is formed. $P(\tilde{Z}|H_0)$ and $P(\tilde{Z}|H_1)$ were defined in equations (11) and (15), respectively. The
constant $\lambda$ is the threshold. Four outcomes from this test are possible: two correct decisions of
selecting $H_0$ when $H_0$ occurred or $H_1$ when $H_1$ occurred, and two incorrect decisions of selecting
$H_1$ when $H_0$ occurred (false detection or false alarm) or $H_0$ when $H_1$ occurred (missed detection).
Note that since $P(\tilde{Z}|H_0)$ and $P(\tilde{Z}|H_1)$ are both positive, it follows that $\lambda > 0$. Further, if $H_1$ is
more likely than $H_0$, the ratio on the left-hand side of the inequality is greater than one; and,
conversely, if $H_0$ is more likely than $H_1$, the ratio is less than one.


Typically, $\lambda$ is set to achieve a specified probability of false alarm, from which a detection
probability follows. Thus, the development here focuses on determining false alarm and detection
probabilities.

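Substituting for $P(\tilde{Z}|H_0)$ and $P(\tilde{Z}|H_1)$ from the previous section and reducing the
ratio yields

$$\exp\left[-\tfrac{1}{2}\,\|\tilde{Z} - M_1\|^2_{V^{-1}} + \tfrac{1}{2}\,\|\tilde{Z}\|^2_{V^{-1}}\right] \underset{H_1}{\overset{H_0}{\lessgtr}} \lambda. \tag{27}$$

Here, $\tilde{Z}$ is the predicted residual sequence, and $M_1$, the modeling anomaly, is the signal to be
detected. Expanding the weighted norms of the exponent and reducing yields

$$\exp\left[-\tfrac{1}{2}\left(M_1^T V^{-1} M_1 - 2 M_1^T V^{-1} \tilde{Z}\right)\right] \underset{H_1}{\overset{H_0}{\lessgtr}} \lambda. \tag{28}$$

Because the logarithm is a monotonic function, the logarithm of the likelihood ratio can be
used without altering the outcome. Taking the natural logarithm of equation (28) yields

$$M_1^T V^{-1} \tilde{Z} - \tfrac{1}{2}\,M_1^T V^{-1} M_1 \underset{H_1}{\overset{H_0}{\lessgtr}} \ln\lambda. \tag{29}$$

If the signal $M_1$ is known, then the second term on the left of inequality (29) can be moved to the
right-hand side, resulting in the LRT:

$$M_1^T V^{-1} \tilde{Z} \underset{H_1}{\overset{H_0}{\lessgtr}} \ln\lambda + \tfrac{1}{2}\,M_1^T V^{-1} M_1. \tag{30}$$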
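The reduction from the ratio of Gaussian densities to the linear form of inequality (29) can be verified numerically. The following is a minimal sketch under assumed values: the covariance `V`, the signal `M1`, and the residual vector `Z` are arbitrary random stand-ins, and the helper `gaussian_logpdf` is a hypothetical name introduced only for this check.

```python
import numpy as np

def gaussian_logpdf(z, mean, V):
    """Log of the jointly Gaussian density used in equations (11) and (15)."""
    d = z - mean
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(V, d))

rng = np.random.default_rng(1)
K = 5
L = rng.standard_normal((K, K))
V = L @ L.T + K * np.eye(K)       # arbitrary positive-definite covariance
M1 = rng.standard_normal(K)       # made-up known signal
Z = rng.standard_normal(K)        # made-up residual vector

# Log-likelihood ratio formed directly from the two densities, as in (26).
log_ratio = gaussian_logpdf(Z, M1, V) - gaussian_logpdf(Z, np.zeros(K), V)
# Left-hand side of inequality (29).
linear_form = M1 @ np.linalg.solve(V, Z) - 0.5 * M1 @ np.linalg.solve(V, M1)
```

Because the two hypotheses share the same covariance, the normalizing constants cancel and the log ratio is linear in the residual vector.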


The left-hand side of inequality (30) is the LRT test statistic. Because it is a linear combination of
Gaussian random variables, it is also a Gaussian random variable. Further, because the LRT
depends solely on the test statistic

$$l(\tilde{Z}) = M_1^T V^{-1} \tilde{Z}, \tag{31}$$

$l(\tilde{Z})$ is a sufficient statistic for the LRT. Thus, knowledge of the probability density function
for $l(\tilde{Z})$ is sufficient for determining the probability of false alarm and the probability of detection.
Under the hypothesis $H_0$: $M_1 = 0$,

$$E[l(\tilde{Z}|H_0)] = E\left[M_1^T V^{-1}\,[-A\,\delta x(k) + N_K]\right] = 0. \tag{32}$$

The variance of $l(\tilde{Z}|H_0)$ is

$$\mathrm{var}[l(\tilde{Z}|H_0)] = E\left[\{M_1^T V^{-1}[-A\,\delta x(k) + N_K]\}\{M_1^T V^{-1}[-A\,\delta x(k) + N_K]\}^T\right], \tag{33a}$$

which reduces to

$$\mathrm{var}[l(\tilde{Z}|H_0)] = M_1^T V^{-1} M_1. \tag{33b}$$

Define the variance of $l(\tilde{Z}|H_0)$ as

$$\sigma_l^2 = M_1^T V^{-1} M_1. \tag{34}$$

Similarly, for the alternate hypothesis $H_1$: $M_1 \ne 0$,

$$E[l(\tilde{Z}|H_1)] = E\left[M_1^T V^{-1}\,[-A\,\delta x(k) + N_K + M_1]\right],$$

which reduces to

$$E[l(\tilde{Z}|H_1)] = M_1^T V^{-1} M_1,$$

or

$$E[l(\tilde{Z}|H_1)] = \sigma_l^2. \tag{35}$$


The variance of $l(\tilde{Z}|H_1)$ is also $\sigma_l^2$. The PDFs for $l(\tilde{Z})$ under both hypotheses are

$$P(l|H_0) \Rightarrow N(0, \sigma_l^2), \tag{36a}$$

$$P(l|H_1) \Rightarrow N(\sigma_l^2, \sigma_l^2). \tag{36b}$$
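The moments in equations (36a) and (36b) can be illustrated with a small Monte Carlo sketch. The covariance `V` and signal `M1` below are made-up values chosen so that $\sigma_l^2 = 4$; state estimation error is neglected, so the residual is simply zero-mean Gaussian noise with covariance `V`, plus `M1` under $H_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
V = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed residual covariance
M1 = np.array([1.0, 2.0])                # assumed known signal
w = np.linalg.solve(V, M1)               # V^{-1} M1, so that l(Z) = w @ Z
sigma2 = M1 @ w                          # sigma_l^2 = M1^T V^{-1} M1, equation (34)

noise = rng.multivariate_normal(np.zeros(2), V, size=200_000)
l_h0 = noise @ w                         # statistic under H0
l_h1 = (noise + M1) @ w                  # statistic under H1
```

The sample mean of the statistic is near zero under $H_0$ and near $\sigma_l^2$ under $H_1$, while the sample variance is near $\sigma_l^2$ in both cases.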

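The LRT of equation (30) can be rewritten as

$$l(\tilde{Z}) \underset{H_1}{\overset{H_0}{\lessgtr}} \epsilon_{LRT}, \tag{37a}$$

where the threshold

$$\epsilon_{LRT} = \ln\lambda + \tfrac{1}{2}\,M_1^T V^{-1} M_1. \tag{37b}$$

The probability of a false alarm $P_F$ is the probability that the test statistic $l(\tilde{Z})$ exceeds the
specified threshold $\epsilon_{LRT}$ when the correct hypothesis is $H_0$. Thus, $P_F$ is the area under $P(l|H_0)$
for $l$ exceeding $\epsilon_{LRT}$, or

$$P_F = \int_{\epsilon_{LRT}}^{\infty} P(l|H_0)\,dl, \tag{38}$$

or, since $P(l|H_0) \Rightarrow N(0, \sigma_l^2)$,

$$P_F = 1 - \mathrm{erf}[\epsilon_{LRT}/\sigma_l], \tag{39}$$

where $\mathrm{erf}(\cdot)$ is the standard error function, here the cumulative distribution function of a
zero-mean, unit-variance Gaussian random variable,

$$\mathrm{erf}[x] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt. \tag{40}$$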


In an analogous manner, the probability of detecting the signal is the probability that the
test statistic is greater than the threshold, given that the correct hypothesis is $H_1$. Thus,

$$P_D = \int_{\epsilon_{LRT}}^{\infty} P(l|H_1)\,dl, \tag{41}$$

which for the Gaussian case is

$$P_D = 1 - \mathrm{erf}\left[(\epsilon_{LRT} - \sigma_l^2)/\sigma_l\right]. \tag{42}$$

Figure 2 illustrates the $P_F$ and $P_D$ of equations (39) and (42).

Given a desired $P_F$, the corresponding threshold can be obtained from a table or computer
program. The $P_D$ for a given $\epsilon$ is determined by the system parameter $\sigma_l^2$, which is a function of
signal design ($M_1$) and noise $V$. Because $\sigma_l^2$ is a measure of signal-to-noise ratio (SNR), the
performance of a hypothesis test as a signal detector is often characterized as $P_D$ vs $P_F$ for a given
SNR. These curves are shown in figure 3 and are called receiver operating characteristic (ROC)
curves (reference 3). Although the specific case of the Gaussian test statistic is treated here, as
given in equations (39) and (42) and illustrated in figures 2 and 3, the results of (38) and (41) hold
for arbitrary densities of $l(\tilde{Z})$.
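A short sketch of how ROC points follow from equations (39) and (42) is given below. It uses the Python standard library's `statistics.NormalDist`, whose `cdf` plays the role of the error function of equation (40); the function names and the particular $P_F$ and $\sigma_l$ values are assumptions for illustration only.

```python
from statistics import NormalDist

_std = NormalDist()  # zero-mean, unit-variance Gaussian

def threshold_for_pf(pf, sigma_l):
    """Invert equation (39): the threshold giving false alarm probability pf."""
    return sigma_l * _std.inv_cdf(1.0 - pf)

def detection_probability(pf, sigma_l):
    """Evaluate equation (42) at the threshold that yields the requested P_F."""
    eps = threshold_for_pf(pf, sigma_l)
    return 1.0 - _std.cdf((eps - sigma_l ** 2) / sigma_l)

# A few ROC points for hypothetical SNR parameters sigma_l = 1, 2, 3.
roc = {s: [(pf, detection_probability(pf, s)) for pf in (0.01, 0.1, 0.5)]
       for s in (1.0, 2.0, 3.0)}
```

For a fixed $P_F$, the detection probability increases with $\sigma_l$, which is the behavior summarized by a family of ROC curves.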

A case of special interest is examined below. Presently, the aspect of state estimation error
is neglected, resulting in a simple signal detection problem. In all cases the predicted residual
covariance is $V = \sigma_n^2 I$. For the case of interest, under the two hypotheses

$H_0$: $\tilde{Z} = N_K$,

$H_1$: $\tilde{Z} = M_{i+1} + N_K$, $i \ge 0$,

where $\tilde{Z}$ is the predicted residual, which with $\delta x \equiv 0$ implies $V = \sigma_n^2 I$, and the $j$th element of the
signal $M_{i+1}$ is

$$a_i\,[(j-1)T]^i, \qquad j = 1, \ldots, K, \tag{43}$$

where $j$ is the time index, $T$ is the sampling period, and $a_i$ is a known coefficient. Substituting
equation (43) into the LRT of equation (30) yields


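$$\frac{a_i}{\sigma_n^2} \sum_{j=1}^{K} [(j-1)T]^i\,\tilde{z}_j \underset{H_1}{\overset{H_0}{\lessgtr}} \ln\lambda + \frac{a_i^2}{2\sigma_n^2} \sum_{j=1}^{K} [(j-1)T]^{2i}. \tag{44}$$

Taking the sampling period as $T = 1$ and normalizing equation (44) by the term

$$k_{ni} = \frac{a_i}{\sigma_n}\,\sqrt{k_{2i}}, \tag{45a}$$

where

$$k_{2i} = \sum_{j=1}^{K} (j-1)^{2i}, \tag{45b}$$

results in a test statistic for the $i$th feature defined as

$$l_i(\tilde{Z}) = \left(\sigma_n \sqrt{k_{2i}}\right)^{-1} \sum_{j=1}^{K} (j-1)^i\,\tilde{z}_j \underset{H_1}{\overset{H_0}{\lessgtr}} \frac{\ln\lambda}{k_{ni}} + \tfrac{1}{2}\,k_{ni}. \tag{46}$$

The new test statistic $l_i(\tilde{Z})$ is a unit-variance Gaussian random variable. Consequently, the
test statistic PDFs under the two hypotheses are

$$P(l_i|H_0) \Rightarrow N(0, 1), \tag{47a}$$

$$P(l_i|H_1) \Rightarrow N(k_{ni}, 1). \tag{47b}$$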

Note that the normalizing factor of equation (45a) is the square root of the last term on the
right-hand side of equation (44) and is the ratio of signal amplitude to effective noise standard deviation
of the smoothing interval. The cases of interest are for $i = \{0, 1, 2\}$. In these cases, the test
statistics are

$$l_0(\tilde{Z}) = \left[\sigma_n \sqrt{K}\right]^{-1} \sum_{j=1}^{K} \tilde{z}_j, \tag{48a}$$

$$l_1(\tilde{Z}) = \left[\sigma_n \sqrt{K(K-1)(2K-1)/6}\right]^{-1} \sum_{j=1}^{K} (j-1)\,\tilde{z}_j, \tag{48b}$$

$$l_2(\tilde{Z}) = \left[\sigma_n \sqrt{K(2K-1)(3K^2-3K-1)(K-1)/30}\right]^{-1} \sum_{j=1}^{K} (j-1)^2\,\tilde{z}_j, \tag{48c}$$

and the normalizing factors (mean of (47b) as defined in (45a)) are

$$k_{n0} = \frac{a_0}{\sigma_n / \sqrt{K}}, \tag{49a}$$

$$k_{n1} = \frac{a_1}{\sigma_n / \sqrt{K(K-1)(2K-1)/6}}, \tag{49b}$$

$$k_{n2} = \frac{a_2}{\sigma_n / \sqrt{K(K-1)(2K-1)(3K^2-3K-1)/30}}. \tag{49c}$$

These three cases are examples of a signal that exhibits the features of a step (jump), ramp (drift),
and simple quadratic (curvature).
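The closed-form sums appearing in equations (48) and (49) can be checked against direct summation, and the normalized statistic can be evaluated on a noise-free feature. This is a minimal sketch assuming a unit sampling period; the function names `k2`, `k2_closed`, and `feature_statistic`, and the test values, are hypothetical.

```python
from math import sqrt

def k2(i, K):
    """Sum over j = 1..K of (j-1)^(2i), assuming a unit sampling period."""
    return sum((j - 1) ** (2 * i) for j in range(1, K + 1))

def k2_closed(i, K):
    """Closed forms used in equations (48) and (49) for i = 0, 1, 2."""
    if i == 0:
        return K
    if i == 1:
        return K * (K - 1) * (2 * K - 1) // 6
    if i == 2:
        return K * (K - 1) * (2 * K - 1) * (3 * K * K - 3 * K - 1) // 30
    raise ValueError("closed form only given for i = 0, 1, 2")

def feature_statistic(z, i, sigma_n):
    """Normalized statistic for the i-th feature (jump, drift, curvature)."""
    K = len(z)
    weighted = sum((j - 1) ** i * z[j - 1] for j in range(1, K + 1))
    return weighted / (sigma_n * sqrt(k2(i, K)))

# Noise-free drift with made-up slope a1 = 2 over K = 4 samples.
z_ramp = [2.0 * (j - 1) for j in range(1, 5)]
stat = feature_statistic(z_ramp, 1, sigma_n=1.0)
k_n1 = 2.0 * sqrt(k2(1, 4)) / 1.0
```

On a noise-free drift, the statistic equals the normalizing factor of the drift case, which is the mean predicted under $H_1$.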




3.2  GENERALIZED  LIKELIHOOD  RATIO  TEST  (GLRT) 


In the previous case of the LRT, the signal to be detected was completely known.
However, for some signal detection problems, a parameter of the signal (such as signal amplitude)
is unknown and may vary over a set of values. This type of detection problem is referred to as a
composite hypothesis. A hypothetical test can be constructed for the composite hypothesis using
the correct (but unknown) value of the signal parameter in the design of the optimal LRT. This test
is an upper bound on the performance of any other test. A uniformly most powerful (UMP) test
has a $P_D$ that is greater than or equal to that of any other test for a given $P_F$. Thus, if a test's actual
performance achieves the bound obtained by the LRT using the correct signal parameter, then it is a
UMP test. For a UMP test to exist, it is both necessary and sufficient that a likelihood ratio test for
every value of the signal parameter can be constructed without knowledge of the signal parameter
(reference 3).


Consider the polynomial case of interest from the preceding section. The LRT is given by
equation (44), which is rewritten here in a slightly different form:


Ho„  2 


j=l 


Hi 


aiP’ 


ai>0, 


(50) 


Here, the scaling terms on $\ell_i(Z)$ that rendered it univariant have been moved to the other side of the
equation. In equation (50), the unknown signal parameter $a_i$ has been assumed to be positive.
Clearly, if $a_i$ is negative then the inequality signs must be reversed. Thus, although the magnitude
of $a_i$ is not required to construct the LRT, its sign must be known to properly construct the test.
Therefore, the LRT cannot be implemented without knowledge of the polarity of the signal
parameter, and since the polarity of the feature parameters that may be present in a residual
sequence cannot be predicted, a UMP test does not exist for this problem. Consequently, an
alternative to the UMP test must be used.

An alternate approach is to estimate the value of the signal parameter and use the estimated
parameter in the LRT. This is the generalized likelihood ratio test (GLRT). The signal model is
defined by equations (8) and (9) for $H_0$ and by equations (13) and (15) for $H_1$. The unknown
signal parameters m are defined in equation (13b). Note that hypothesis $H_0$ is just the special
case that m has the specific value of zero. The maximum likelihood estimate of m is

$$\hat{m} = [B^T V^{-1} B]^{-1} B^T V^{-1} Z, \tag{51}$$

and the estimate of the predicted residual mean M is

$$\hat{M} = B\hat{m}. \tag{52}$$


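Substituting $\hat{M}$ for $M_i$ of equation (29) (the subscript is being suppressed) yields

$$\hat{M}^T V^{-1} Z - \tfrac{1}{2}\hat{M}^T V^{-1}\hat{M} \;\underset{H_1}{\overset{H_0}{\gtrless}}\; \ln\lambda. \tag{53}$$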

Substituting equation (51) into equation (52) and using the result in equation (53) yields, after
some manipulation,

$$\ell(Z) = Z^T W Z \;\underset{H_1}{\overset{H_0}{\gtrless}}\; \ln\lambda^2, \tag{54}$$

where $W = V^{-1}B[B^T V^{-1} B]^{-1}B^T V^{-1}$. Note that the test statistic is a weighted norm of Z and is
positive; thus, $\lambda$ must be greater than 1 for a meaningful test.

Inserting $I = [B^T V^{-1} B]^{-1}[B^T V^{-1} B]$ into the weight factoring, and using equation (51),
reduces the GLRT of equation (54) to

$$\ell(Z) = \hat{m}^T [B^T V^{-1} B]\,\hat{m} = \hat{M}^T V^{-1} \hat{M}. \tag{55}$$

The estimate of the polynomial coefficients $\hat{m}$ is also a Gaussian random variable with mean m
and covariance $[B^T V^{-1} B]^{-1}$; consequently,

$$P(\hat{m}) \Rightarrow N(m, [B^T V^{-1} B]^{-1}). \tag{56}$$




Because the covariance of $\hat{m}$ is symmetric positive definite, its inverse can be factored as

$$B^T V^{-1} B = D^T D;$$

letting

$$r = D\hat{m}$$

makes the GLRT become

$$\ell(Z) = r^T r \;\underset{H_1}{\overset{H_0}{\gtrless}}\; \ln\lambda^2,$$

where

$$P(r) \Rightarrow N(Dm, I).$$

The test statistic is a sum of nonzero-mean, univariant, squared, Gaussian random variables;
therefore, $P[\ell(Z)]$ is a noncentral, chi-squared distribution with the number of degrees of
freedom equal to the number of polynomial coefficients and a parameter of noncentrality of
(references 4-6):

$$C = M^T V^{-1} M = \ell(Z)\big|_{Z=M}. \tag{61}$$
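
The GLRT statistic can be sketched numerically as follows. This is a minimal illustration, assuming white noise ($V = \sigma_n^2 I$) so that the weighting reduces to a division by the noise variance; the helper names `solve` and `glrt_statistic` are hypothetical, and the feature matrix B is passed as a list of K rows.

```python
def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def glrt_statistic(z, B, sigma_n):
    """Return (m_hat^T B^T B m_hat / sigma_n^2, m_hat), assuming V = sigma_n^2 I."""
    K, q = len(B), len(B[0])
    BtB = [[sum(B[k][r] * B[k][c] for k in range(K)) for c in range(q)] for r in range(q)]
    Btz = [sum(B[k][r] * z[k] for k in range(K)) for r in range(q)]
    m_hat = solve(BtB, Btz)  # least-squares (maximum likelihood) coefficient estimate
    # (B^T B) m_hat equals B^T z, so the quadratic form is m_hat . (B^T z).
    statistic = sum(m * v for m, v in zip(m_hat, Btz)) / sigma_n ** 2
    return statistic, m_hat
```

For noise-free residuals the statistic equals the squared norm of the mean divided by the noise variance, which is the noncentrality parameter of the chi-squared distribution described above.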

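Consider the example of special interest introduced previously. Substituting for M from
equation (43),

$$M_{i+1} = a_i\,[(j-1)T]^i, \tag{62}$$

and realizing that for this case $V = \sigma_n^2 I$ yields the test statistic

$$\left[\sigma_n^2 \sum_{j=1}^{K} (j-1)^{2i}\right]^{-1} \left[\sum_{j=1}^{K} (j-1)^i z_j\right]^2 \;\underset{H_1}{\overset{H_0}{\gtrless}}\; \ln\lambda^2. \tag{63}$$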


This  form  can  be  simplified  so  as  to  avoid  evaluation  of  the  noncentral,  chi-squared  density. 


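Because $\lambda > 1$, the right-hand side of the inequality in (63) is positive; hence, the square
root of the test statistic can be used, resulting in

$$\left[\sigma_n \sqrt{\sum_{j=1}^{K} (j-1)^{2i}}\,\right]^{-1} \left|\sum_{j=1}^{K} (j-1)^i z_j\right| \;\underset{H_1}{\overset{H_0}{\gtrless}}\; [\ln\lambda^2]^{1/2}. \tag{64}$$

Let

$$r_i = \left[\sigma_n \sqrt{\sum_{j=1}^{K} (j-1)^{2i}}\,\right]^{-1} \sum_{j=1}^{K} (j-1)^i z_j. \tag{65}$$

Then

$$P(r_i|H_0) \Rightarrow N[0,1], \tag{66a}$$

$$P(r_i|H_1) \Rightarrow N[k_{ni},1]. \tag{66b}$$

Consequently, the test can be restated as

$$|r_i| \;\underset{H_1}{\overset{H_0}{\gtrless}}\; [\ln\lambda^2]^{1/2} = \varepsilon_{GLRT}.$$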

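Figure 4 illustrates the normal density of r under the two hypotheses. Because the test
statistic is the absolute value of r, the evaluation of $P_F$ and $P_D$ must include the areas under the
density function for both positive and negative values of r that exceed the threshold; hence,

$$P_F = \int_{-\infty}^{-\varepsilon_{GLRT}} P(r_i|H_0)\,dr_i + \int_{\varepsilon_{GLRT}}^{\infty} P(r_i|H_0)\,dr_i, \tag{67a}$$

$$P_F = 2\,(1 - \mathrm{erf}[\varepsilon_{GLRT}]), \tag{67b}$$

$$P_D = \int_{-\infty}^{-\varepsilon_{GLRT}} P(r_i|H_1)\,dr_i + \int_{\varepsilon_{GLRT}}^{\infty} P(r_i|H_1)\,dr_i, \tag{68a}$$

$$P_D = \mathrm{erf}[-\varepsilon_{GLRT} - k_{ni}] + 1 - \mathrm{erf}[\varepsilon_{GLRT} - k_{ni}]. \tag{68b}$$


Figure 4. Binary GLRT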


Note that the shaded areas of the density models in figure 4 are probability masses of the
corresponding single-degree-of-freedom, chi-squared and noncentral, chi-squared random
variables of $r_i^2$ given $H_0$ and $r_i^2$ given $H_1$, respectively. For a given false alarm probability the
LRT detection probability is greater than the GLRT detection probability. A plot of the ROC for
the GLRT is given in figure 5. Although not apparent in the figure, the LRT outperforms the
GLRT by a maximum of approximately 14 percent in probability of detection.
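
The false alarm and detection probabilities can be evaluated numerically with the sketch below. It rests on two assumptions that are not confirmed by the text shown here: that the report's erf[.] denotes the cumulative distribution function of a unit normal variable, and that the comparison LRT is the usual one-sided test with known polarity. All function names are hypothetical.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # zero mean, unit variance


def glrt_false_alarm(eps):
    """Two-sided tail mass of N(0, 1) beyond +/- eps."""
    return 2.0 * (1.0 - _UNIT_NORMAL.cdf(eps))


def glrt_detection(eps, k):
    """Two-sided tail mass of N(k, 1) beyond +/- eps."""
    return _UNIT_NORMAL.cdf(-eps - k) + 1.0 - _UNIT_NORMAL.cdf(eps - k)


def glrt_threshold(p_false):
    """Threshold eps that gives the requested GLRT false alarm probability."""
    return _UNIT_NORMAL.inv_cdf(1.0 - p_false / 2.0)


def lrt_detection(p_false, k):
    """Detection probability of a one-sided test (assumed form of the known-polarity LRT)."""
    eps = _UNIT_NORMAL.inv_cdf(1.0 - p_false)
    return 1.0 - _UNIT_NORMAL.cdf(eps - k)
```

With a false alarm probability of 0.1 and a normalized signal level of 1.5, this sketch gives a detection probability near 0.59 for the one-sided test and near 0.44 for the two-sided test, a gap of about 0.14.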


Figure 5. GLRT Receiver Operating Characteristic


3.3  ARTIFICIAL  NEURAL  NETWORK  (ANN)  HYPOTHESIS  TEST 


Another approach to hypothesis testing is to employ an artificial neural network (ANN) that
emulates the hypothesis testing algorithm. An ANN is an interconnected network of nodes and
branches. The branches are weighted and serve as both inputs and outputs for the nodes. The
nodes sum the weighted inputs and provide a single output that is a function (usually nonlinear) of
the sum of the inputs. Typical architectures use layers of nodes with the output of each node of
one layer providing an input to every node of the next layer. Figure 6 illustrates a single node with
six inputs and a sigmoid output function. A typical feedforward architecture with two hidden
layers is illustrated in figure 7. Note that the output layer need not be a single node, and that in
general the number of hidden layers, the number of nodes in each layer, as well as the types of
function(s) to be used are part of the "art" of designing an ANN. Given a set of input n-tuples and
corresponding p-tuple outputs, the weights ($w_j$) and biases ($b_j$) that provide the best fit to the
input-output map can be determined. The process of finding the weights and biases is termed
"training," and the data set on which training is performed is called the "training set."


Figure  6.  Typical  Single  Neuron 
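
A single node and a layered feedforward pass of the kind described above can be sketched as follows. The function names and the list-of-layers representation are illustrative assumptions, not the software used in the report.

```python
import math


def sigmoid(x):
    """Smooth, continuous, nonlinear output function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(inputs, weights, bias):
    """One node: weighted sum of the inputs plus a bias, passed through the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def feedforward(inputs, layers):
    """Propagate inputs through layers; each layer is (list of weight rows, list of biases)."""
    activation = list(inputs)
    for weight_rows, biases in layers:
        # Every node of a layer sees every output of the previous layer.
        activation = [node_output(activation, w, b) for w, b in zip(weight_rows, biases)]
    return activation
```

For example, a node with six zero inputs and zero bias returns 0.5, the midpoint of the sigmoid.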


Figure 7. Typical Feedforward Network with Two Hidden Layers


3.4  TRAINING  THE  ARTIFICIAL  NEURAL  NETWORK 

The function used in the nodes will determine the procedure needed to train the neural
network. Here a sigmoid, which is a smooth, continuous, nonlinear function, is used. These
characteristics allow standard nonlinear search techniques to be employed in training.

Training also entails having an input/output relationship for the ANN to emulate. In this
case the network will be required to produce a function of the test statistic of equation (48),
$\ell_i(Z)$, when Z is given as an input. The sigmoid function defined in figure 6 has an output in the
range [0,1] as shown in figure 8.

The function of the test statistic is defined as follows. For a feature made with a positive
coefficient the required output function $y_i(Z)$ is defined with the sigmoid function as
$\{1.0 + \exp[-\alpha \ell_i(Z)]\}^{-1}$, where $\alpha$ is chosen so that the output achieves the specific value of a
2-percent probability of false alarm at the output levels of 0.5 ± 0.1. Similarly, if the coefficient is
negative, $y_i(Z)$ is defined as $\{1.0 + \exp[\alpha \ell_i(Z)]\}^{-1}$. Note that these two functions give a
continuous output from zero to one as the feature parameter $a_i$ goes from minus infinity to plus
infinity, that $\ell_i(Z)$ can be readily computed by the inverse mapping of the output function $y_i$, and
that the polarity of the feature is indicated.
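
The output function and its inverse mapping can be illustrated with the short sketch below. It uses only the positive-coefficient form with a signed statistic, reads outputs above 0.5 as positive polarity, and treats the gain as a free parameter; these conventions and the function names are assumptions made for illustration.

```python
import math


def target_output(statistic, alpha):
    """Required network output for the positive-coefficient form: sigmoid of alpha * statistic."""
    return 1.0 / (1.0 + math.exp(-alpha * statistic))


def statistic_from_output(y, alpha):
    """Inverse mapping (logit) that recovers the test statistic from an output in (0, 1)."""
    return math.log(y / (1.0 - y)) / alpha


def polarity_from_output(y):
    """Assumed reading: outputs above 0.5 indicate positive polarity, below 0.5 negative."""
    if y > 0.5:
        return 1
    if y < 0.5:
        return -1
    return 0
```

Mapping a statistic through `target_output` and back through `statistic_from_output` with the same gain returns the original value, which is the property the inverse mapping relies on.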

Fifteen data points were determined to be an appropriate tradeoff between a longer
integration time for better feature resolution and detection, and a shorter record length to limit the
number of features likely to occur in a given sequence. Longer sequences can be accommodated
by sliding the data window.

The software used to train the network also required the inputs to be non-negative. To
overcome this and for ease of training the network to recognize features, the inputs to the network
were scaled and shifted as

$$\tilde{z}(i) = z(i)\,\frac{0.0125}{\sigma_n} + 0.5, \tag{69}$$

where the scale factor of 0.0125 was arbitrarily selected to yield a dynamic range of 80
measurement noise standard deviations for the combined waveforms.
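
A minimal sketch of the scale-and-shift step follows, assuming the scale factor is applied to the residual expressed in units of the noise standard deviation, so that residuals between -40 and +40 standard deviations map to the interval [0, 1]. The function name and argument names are hypothetical.

```python
def scale_inputs(z, sigma_n, scale=0.0125, offset=0.5):
    """Map residuals to non-negative network inputs: z * scale / sigma_n + offset."""
    return [zi * scale / sigma_n + offset for zi in z]
```

Under this assumption a residual of zero maps to 0.5, and the ends of the 80-standard-deviation range map to 0 and 1.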

The  ANN  was  trained  as  follows.  Given  residual  vectors  and  their  associated  output 
functions,  compare  the  network's  outputs  to  the  required  output  functions  and  adjust  the  weights 
and  biases  to  minimize  the  error.  This  procedure  is  illustrated  in  figure  9.  While  the  training  rule 
used to adjust the weights and biases can be any nonlinear search technique, a quasi-Newton
technique  was  employed  here. 
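
The training loop can be illustrated on a single node. Note that this sketch substitutes plain batch gradient descent for the quasi-Newton search used in the report, and that the training set generator, the gain value, and all names are hypothetical choices made only to keep the example small.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_training_set(seed, count, K, alpha):
    """Noise-only residual vectors paired with the sigmoid of the normalized jump statistic."""
    rng = random.Random(seed)
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(count)]
    targets = [sigmoid(alpha * sum(z) / math.sqrt(K)) for z in inputs]
    return inputs, targets


def mean_squared_error(inputs, targets, w, b):
    errors = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
              for x, y in zip(inputs, targets)]
    return sum(e * e for e in errors) / len(errors)


def train_node(inputs, targets, rate=0.5, epochs=500):
    """Adjust weights and bias by batch gradient descent on the mean squared output error."""
    n = len(inputs[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(inputs, targets):
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            delta = 2.0 * (out - y) * out * (1.0 - out)  # d(error^2)/d(pre-activation)
            grad_w = [g + delta * xi for g, xi in zip(grad_w, x)]
            grad_b += delta
        w = [wi - rate * g / len(inputs) for wi, g in zip(w, grad_w)]
        b -= rate * grad_b / len(inputs)
    return w, b
```

Because the target here is itself a sigmoid of a linear function of the inputs, a single node can represent it exactly, so the error after training should be far below the error of the untrained node.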


Figure 8. Illustration of the Sigmoid Function


Figure 9. Illustration of Network Training


4.  MULTIPLE  HYPOTHESIS  TEST 


In section 3 the problem of detecting a signal with known structure in the presence of noise
was considered. Because that test considers one of two possible outcomes (either the signal is
present or it is not), it is often called a binary hypothesis test. This section discusses the multiple
hypothesis tests.


4.1  LIKELIHOOD  RATIO  TEST 

The anomalous feature model was defined by equations (12) and (13) as a (q - 1)th order
polynomial. In the binary hypothesis test of section 3, the signal detection problem was
considered. In that case the signal structure was known; that is, polynomial order was known as
well as which coefficients were nonzero. Here, only the maximum model order is known, and a
test is devised to determine which polynomial coefficients are nonzero. This is a standard problem
in regression analysis and the development presented here has been adapted from references 6 and
7.


The case considered here is a maximum polynomial order of one for the anomalous feature;
that is,

$$m = [a_0, a_1]^T. \tag{70}$$

The hypotheses are

$H_0$: [$a_0 = 0$, $a_1 = 0$] no anomaly present - noise only,

$H_1$: [$a_0 \neq 0$, $a_1 = 0$] jump anomaly only,

$H_2$: [$a_0 = 0$, $a_1 \neq 0$] drift anomaly only,

$H_3$: [$a_0 \neq 0$, $a_1 \neq 0$] jump and drift anomalies.

The development is easily extended to arbitrary model order by noting that the set of q polynomial
coefficients results in $2^q$ hypotheses.
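
The count of hypotheses can be made concrete with a short enumeration. The indexing rule used here (bit k of the hypothesis index marks coefficient $a_k$ as nonzero) reproduces the four hypotheses listed above for q = 2; its extension to larger q is an assumption, and the function name is hypothetical.

```python
def enumerate_hypotheses(q):
    """List (index, pattern) pairs; pattern[k] is True when coefficient a_k is nonzero."""
    hypotheses = []
    for index in range(2 ** q):
        pattern = tuple(bool((index >> k) & 1) for k in range(q))
        hypotheses.append((index, pattern))
    return hypotheses
```

For q = 2 the patterns are, in order, noise only, jump only, drift only, and jump and drift.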

Consider the special case of uncorrelated error in the predicted residual sequence. This
corresponds to the case of negligible estimation error. In this case the model for the predicted
residual of equation (13) is

$$Z = N_K + M. \tag{71}$$

Because the measurement noise vector is an independent, zero-mean, Gaussian random vector,
it follows that

$$P(Z) \Rightarrow N[M, \sigma_n^2 I].$$

The joint density function of the predicted residuals, conditioned on the hypothesis $H_i$ that
the ith anomalous feature set is present in the sequence (i.e., jump-only, drift-only, jump-drift,
etc.), is

$$P(Z|H_i) \Rightarrow N[M_i, \sigma_n^2 I],$$

where $M_i$ is the corresponding mean vector induced by the anomaly of the ith hypothesis. The
likelihood ratio test between any two hypotheses $H_i$ and $H_j$ is

$$LRT(Z) = \frac{P(Z|H_i)}{P(Z|H_j)} \;\underset{H_i}{\overset{H_j}{\gtrless}}\; \lambda. \tag{72}$$

Here, if the LRT exceeds the threshold $\lambda$, then hypothesis $H_j$ is rejected. Likewise, if the LRT is
below the threshold, then hypothesis $H_i$ is rejected.

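The case where the measurement noise standard deviation $\sigma_n$ is known and the anomaly
$M = [m_1, \ldots, m_K]^T$ is unknown is considered first. Substituting for $P(Z|H)$ for the general case
results in

$$LRT(Z) = \frac{\left[(2\pi)^{K/2}|V|^{1/2}\right]^{-1} \exp\left[-\tfrac{1}{2}(Z - M_i)^T V^{-1} (Z - M_i)\right]}{\left[(2\pi)^{K/2}|V|^{1/2}\right]^{-1} \exp\left[-\tfrac{1}{2}(Z - M_j)^T V^{-1} (Z - M_j)\right]} \;\underset{H_i}{\overset{H_j}{\gtrless}}\; \lambda. \tag{73}$$

Taking the natural logarithm yields

$$\ln[LRT(Z)] = -\tfrac{1}{2}\|Z - M_i\|^2_{V^{-1}} + \tfrac{1}{2}\|Z - M_j\|^2_{V^{-1}} \;\underset{H_i}{\overset{H_j}{\gtrless}}\; \ln\lambda. \tag{74}$$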


For the case of interest, $V = \sigma_n^2 I$ and, because M is unknown, the estimated $\hat{M}$ of equations (51)
and (52) is used, resulting in

$$\hat{M} = B\hat{m}, \tag{75a}$$

$$\hat{m} = [B^T B]^{-1} B^T Z, \tag{75b}$$

which upon substituting in (74) yields the GLRT:

$$\ln[GLRT(Z)] = \frac{\|Z - \hat{M}_j\|^2 - \|Z - \hat{M}_i\|^2}{2\sigma_n^2}, \tag{76a}$$

or

$$\frac{1}{\sigma_n^2}\left[\sum_{k=1}^{K} (z_k - \hat{m}_{jk})^2 - \sum_{k=1}^{K} (z_k - \hat{m}_{ik})^2\right] \;\underset{H_i}{\overset{H_j}{\gtrless}}\; \ln\lambda^2, \tag{76b}$$

where $\hat{m}_{jk}$ is the kth component of the vector $\hat{M}_j$. Define the test statistic for the multiple
hypothesis generalized likelihood ratio test as

$$\ell_m(Z) = \ln[GLRT(Z)]. \tag{76c}$$

To obtain quantifiable results for the GLRT, it is necessary to know the probability density
function (PDF) of the test statistic $\ell_m$. The remainder of this section is devoted to the development
of the test statistic's PDF.


From equation (13), the mean vector M can be decomposed into a linear combination of
columns of the matrix B; that is,

$$M = a_0 B_1 + a_1 B_2 + \ldots + a_{q-1} B_q, \tag{77a}$$

where $B_i$ are the columns of B:

$$B = [B_1 | \ldots | B_q], \tag{77b}$$


and 


(77c) 


The vectors $\{B_i\}$ are independent and therefore form the basis for a q-dimensional subspace of
the K-dimensional space of the predicted residual vector. Let the hypothesis $H_j$ be the anomaly
modeled by the (q - 1)th order polynomial with all coefficients present ($H_3$ for the case of interest).
The other hypotheses are developed as constraints on the coefficients; that is,

$$C_i m = 0. \tag{78}$$

For the case of interest here, the hypothesis $H_i$ results in the following constraint matrices $C_i$:

$$H_0: C_0 = I_{2\times 2}, \qquad H_1: C_1 = [0, 1], \qquad H_2: C_2 = [1, 0],$$

where $I_{2\times 2}$ is the 2 × 2 identity matrix. When hypothesis $H_i$ ($i \neq j$) is true, p constraints are imposed
on the coefficients $a_i$ that can be used to reduce the number of independent terms in the
decomposition of M in equation (77). Consequently, if $H_i = H_1$, then $a_1 = 0$, and

$$M_1 = a_0 B_1, \tag{79}$$

where the rank of C equals p and the space spanned by the remaining columns of B has been
reduced by one. In general, if p constraints are imposed by the ith hypothesis, then

$$M_i = a'_0 B'_1 + \ldots + a'_{q-p-1} B'_{q-p}, \tag{80}$$

where $a'_i$ are the surviving coefficients and $B'_i$ are the basis vectors of the (q - p)-dimensional subspace
spanned by M under $H_i$.

Let the set of K × 1 vectors $\{r_k\}$ be an orthonormal basis for the K-dimensional space of
the predicted residuals. Specifically, let the first (q - p) vectors $\{r_k\}_{k=1}^{q-p}$ be the basis for the space
spanned by the constrained columns of B, i.e., $\{B'_k\}_{k=1}^{q-p}$, and, further, let the set of vectors $\{r_k\}_{k=1}^{q}$
span the column space of B; then,


$$Z = R\alpha \tag{81a}$$

and

$$M = R\gamma, \tag{81b}$$

where

$$R = [r_1 | \ldots | r_K], \tag{81c}$$

and $\alpha$ and $\gamma$ are K × 1 vectors of the coefficients of Z and M, respectively, with respect to the
orthonormal basis vectors $\{r_k\}_{k=1}^{K}$ and


IS 

II 

8 

k 

H 

(81d) 

$$\gamma = [\gamma_1, \ldots, \gamma_{q-p}, 0, \ldots, 0]^T. \tag{81e}$$

Substituting equation (81) into the two forms of equation (74), under the condition that test
hypothesis $H_i$ is true, results in

$$\|Z - M_j\|^2 = \sum_{k=1}^{q} (\alpha_k - \gamma_k)^2 + \sum_{k=q+1}^{K} \alpha_k^2 \tag{82a}$$

and

$$\|Z - M_i\|^2 = \sum_{k=1}^{q-p} (\alpha_k - \gamma_k)^2 + \sum_{k=q-p+1}^{K} \alpha_k^2. \tag{82b}$$

For the GLRT, M is not available and must be estimated. The estimate is given by equation (75),
which for the case of interest here ($V = \sigma_n^2 I$) is equivalently the least-squares estimate or the
maximum likelihood estimate. Because the estimate of M must minimize the sum of the squared
error in the fit of the mean M to the predicted residuals Z, it follows that $\hat{M}$ must minimize
equation (82). From the right-hand side of equation (82), the least-squares estimate of $\hat{M} = R\hat{\gamma}$ is


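$$H_i:\ \hat{\gamma}_k = \alpha_k; \quad k = 1, \ldots, q - p, \tag{83a}$$

$$H_j:\ \hat{\gamma}_k = \alpha_k; \quad k = 1, \ldots, q. \tag{83b}$$

Consequently, equation (82) reduces to

$$\|Z - \hat{M}_j\|^2 = \sum_{k=q+1}^{K} \alpha_k^2 \tag{84a}$$

and

$$\|Z - \hat{M}_i\|^2 = \sum_{k=q-p+1}^{K} \alpha_k^2. \tag{84b}$$

Substituting equation (84) into the GLRT of equation (76) and reducing yields

$$\frac{1}{\sigma_n^2} \sum_{k=q-p+1}^{q} \alpha_k^2 \;\underset{H_j}{\overset{H_i}{\gtrless}}\; \ln\lambda^{-2}. \tag{85}$$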


The predicted residual Z is a Gaussian random vector with mean $M_i$, covariance $\sigma_n^2 I$, and
PDF given by

$$P(Z) \Rightarrow N(M_i, \sigma_n^2 I). \tag{86}$$

From equation (82), the PDF of the random vector $\alpha$ is

$$P(\alpha) = \left[\det|R^{-1}|\right]^{-1} P(R^{-1} Z). \tag{87}$$

Because R is a linear orthogonal transformation ($R^{-1} = R^T$ and $\det|R| = 1$), it follows that the
random vector $\alpha$ is also Gaussian with PDF

$$P(\alpha) \Rightarrow N(\gamma, \sigma_n^2 I). \tag{88}$$

Consequently, the GLRT statistic $\ell_m$ is the sum of the square of p, zero-mean, unit-variance,
independent, Gaussian random variables and is therefore a chi-squared random variable with p
degrees of freedom (the rank of the constraint matrix). Recall that this result was obtained under the
condition that $H_i$ is true. Therefore, the test statistic $\ell_m$ will exceed the threshold and reject $H_i$
when $H_i$ is true with a probability that is given by the chi-squared distribution with p degrees of
freedom. Similarly, if $H_j$ is true, then at least some of the random variables $\{\alpha_k: k = q - p + 1, \ldots, q\}$
are not zero mean, and the test statistic becomes a noncentral, chi-squared random variable with p
degrees of freedom. Under the conditions of $H_j$, the test statistic increases, thereby increasing the
probability that the threshold will be exceeded and $H_i$ rejected.


The GLRT statistic can be represented in a form that is more convenient to use. Noting
from equations (81) and (82) that

$$\hat{M}_j = \sum_{k=1}^{q} \alpha_k r_k \tag{89a}$$

and

$$\hat{M}_i = \sum_{k=1}^{q-p} \alpha_k r_k, \tag{89b}$$

it follows that the difference in the estimates of the means under hypotheses j and i is

$$[\hat{M}_j - \hat{M}_i] = \sum_{k=q-p+1}^{q} \alpha_k r_k, \tag{89c}$$

and the squared norm of the difference of the means is

$$\|\hat{M}_j - \hat{M}_i\|^2 = \sum_{k=q-p+1}^{q} \alpha_k^2. \tag{90}$$

Note, however, that the right-hand side of (90) is the unnormalized GLRT statistic of equation (85);
hence,

$$\|Z - \hat{M}_i\|^2 - \|Z - \hat{M}_j\|^2 = \|\hat{M}_j - \hat{M}_i\|^2, \tag{91a}$$

or

$$\ell_m(Z) = \frac{\|\hat{M}_j - \hat{M}_i\|^2}{\sigma_n^2} \;\underset{H_j}{\overset{H_i}{\gtrless}}\; \ln\lambda^{-2}. \tag{91b}$$

That is, the GLRT statistic reduces to the squared norm of the distance between the two mean
vectors estimated under the jth and ith hypotheses. Given that $H_j$ is the full order model, then $\ell_m$
is central chi-squared with p degrees of freedom when $H_i$ is true and is noncentral chi-squared
with p degrees of freedom when $H_j$ is true.


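For the specific case of j = 3, the tests of interest are

$$T_2:\ \frac{\|\hat{M}_3 - \hat{M}_2\|^2}{\sigma_n^2} \;\underset{H_3}{\overset{H_2}{\gtrless}}\; \varepsilon_2, \tag{92a}$$

$$T_1:\ \frac{\|\hat{M}_3 - \hat{M}_1\|^2}{\sigma_n^2} \;\underset{H_3}{\overset{H_1}{\gtrless}}\; \varepsilon_1, \tag{92b}$$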


H3 


(92c) 


Tests $T_i$ evaluate the statistical significance of the presence of each coefficient or feature in the
polynomial indicative of anomalies in the predicted residual sequence. For $T_2$ the test statistic is
the norm-squared value in units of noise standard deviation of the difference between the estimates
of the mean obtained under the hypothesis that jump and drift features are present
($H_3$: $a_0 \neq 0$, $a_1 \neq 0$) and the hypothesis that only drift is present ($H_2$: $a_0 = 0$, $a_1 \neq 0$). Thus,
$T_2$ is a test of the significance of the nonzero estimate of the jump coefficient $a_0$. If the test statistic
is below threshold, the hypothesis that jump and drift are present is rejected in favor of the
hypothesis that only a drift feature is present, i.e., no significant jump feature is evident.
Conversely, if the test statistic exceeds the threshold, the hypothesis that only a drift is present
($H_2$) is rejected in favor of the hypothesis that both jump and drift features ($H_3$) are evident in the
sequence.

When $H_2$ is true, which implies $a_0$ is zero, the test statistic is central chi-squared with one
degree of freedom. Consequently, the threshold $\varepsilon_2$ can be established to give a desired probability
of false detection of the jump feature. When $H_3$ is true, a jump is present and the test statistic is
noncentral chi-squared with one degree of freedom. Although the value of the jump coefficient $a_0$
is unknown, the parameter of noncentrality could be computed for some value of $a_0$ and the
threshold $\varepsilon_2$ set to achieve a specified probability of jump detection at a given amplitude.
However, this approach was not used. Notice that rejection of hypothesis $H_3$ by test $T_2$ does not
imply that the drift hypothesis is accepted. The only conclusion that can be drawn from test $T_2$ is
whether or not jump is present. Similar reasoning applies for test $T_1$, which determines if drift is
present. The $T_3$ test determines if the combined hypothesis of jump plus drift is to be
accepted/rejected over the null hypothesis of no anomaly ($H_0$). This test statistic is central
chi-squared with two degrees of freedom when $H_0$ is true. (This test is redundant with tests $T_1$ and $T_2$
and is not used.)

To complete the hypothesis test, a logic tree is constructed from the possible outcomes
listed in table 1.


Table 1. Multiple Hypothesis Test


If the result of $T_1$ indicates the presence of drift (reject $H_1 \Rightarrow a_1 \neq 0$) and that of $T_2$ indicates
the absence of a jump (reject $H_3 \Rightarrow a_0 = 0$), the logical conclusion is that the predicted residuals
exhibit a drift-only feature ($H_2$).
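
The threshold setting and the final decision can be sketched as below. The chi-squared thresholds follow from standard identities (a one-degree-of-freedom chi-squared variable is a squared unit normal; the two-degree-of-freedom tail is exponential). The four-way decision rule is an assumed reading of the logic tree, inferred from the drift-only example above rather than taken from table 1, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_threshold_1dof(p_false):
    """Threshold eps with P(chi-squared, 1 dof > eps) = p_false."""
    return NormalDist().inv_cdf(1.0 - p_false / 2.0) ** 2


def chi2_threshold_2dof(p_false):
    """Threshold eps with P(chi-squared, 2 dof > eps) = p_false, using the tail exp(-eps / 2)."""
    return -2.0 * math.log(p_false)


def classify(t1, t2, eps1, eps2):
    """Combine the drift test T1 and the jump test T2 into one of H0..H3 (assumed logic tree)."""
    drift_present = t1 > eps1  # jump-only model rejected in favor of jump plus drift
    jump_present = t2 > eps2   # drift-only model rejected in favor of jump plus drift
    if jump_present and drift_present:
        return "H3"
    if jump_present:
        return "H1"
    if drift_present:
        return "H2"
    return "H0"
```

For a 5-percent false detection probability the one-degree-of-freedom threshold is about 3.84.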


4.2  MODIFIED  NEYMAN-PEARSON  TEST 

Multiple feature detection can also be performed as a sequence of independent binary
hypothesis tests with the aid of an ad hoc modification to the predicted residuals. With this
approach, an estimate for the feature, $\hat{m}$, is obtained under the hypothesis that all the features are
present, $H_3$ in this case. New data sequences are formed that contain a single feature by
subtracting the other features; that is,

$$\tilde{Z}_1 = Z - \hat{M}_2, \tag{93a}$$

$$\tilde{Z}_2 = Z - \hat{M}_1, \tag{93b}$$

where

$$\hat{M}_1 = B E_1 \hat{m}, \tag{93c}$$

$$\hat{M}_2 = B E_2 \hat{m}, \tag{93d}$$

and

$$E_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \tag{93e}$$

$$E_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}. \tag{93f}$$

Here, $\tilde{Z}_1$ is a pseudo-predicted residual sequence that, in principle, has had all the features
removed except jump. Similarly, the sequence $\tilde{Z}_2$ is devoid of all features except drift.
Previously, it was shown that the feature model of $H_3$ will give a better fit to Z than the other
feature models containing fewer coefficients. Thus, in some sense, $\tilde{Z}_1$ and $\tilde{Z}_2$ represent separate
observations of the individual features, which allows for independent binary tests on each feature
as in equation (31). However, when using $\tilde{Z}_i$ and $\hat{M}_i$ in equation (31) it must be noted that the
results of each binary test are obtained assuming the presence of all the other features. That is,
jump exists (does not exist) given that a drift is present, etc. Recall that equation (31) is the test
statistic for the binary hypothesis general likelihood ratio test. Because the LRT is derived from
the Neyman-Pearson test criteria, this technique is called the modified Neyman-Pearson (MNP)
test. Note that only one set of coefficient estimates $\hat{m}$ is required, which is a significant
computational savings over the chi-squared test. However, the probability density function for the
MNP test statistic was not determined. Consequently, the thresholds needed to give the required
probabilities of false alarm and detection are not determined.
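
The construction of the pseudo-residual sequences can be sketched as follows for the jump-plus-drift case. The sketch assumes a unit sample interval, white noise of known standard deviation, and that each single-feature sequence is scored with the normalized binary statistic of equation (65); equation (31) itself is not reproduced here, so that last choice is an assumption. The function names are hypothetical.

```python
import math


def fit_jump_and_drift(z):
    """Least-squares fit of z_j = a0 + a1 * (j - 1), j = 1..K, with unit sample interval."""
    K = len(z)
    x = [float(n) for n in range(K)]
    x_bar, z_bar = sum(x) / K, sum(z) / K
    a1 = (sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
          / sum((xi - x_bar) ** 2 for xi in x))
    a0 = z_bar - a1 * x_bar
    return a0, a1


def mnp_statistics(z, sigma_n):
    """Return (jump statistic, drift statistic) computed from the pseudo-residual sequences."""
    K = len(z)
    a0, a1 = fit_jump_and_drift(z)
    z_jump_only = [zi - a1 * n for n, zi in enumerate(z)]  # estimated drift subtracted
    z_drift_only = [zi - a0 for zi in z]                    # estimated jump subtracted
    r_jump = sum(z_jump_only) / (sigma_n * math.sqrt(K))
    drift_energy = sum(n * n for n in range(K))
    r_drift = (sum(n * zi for n, zi in enumerate(z_drift_only))
               / (sigma_n * math.sqrt(drift_energy)))
    return r_jump, r_drift
```

Only one coefficient fit is needed for both statistics, which is the computational saving noted above.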


4.3  ARTinCIAL  NEURAL  NETWORK 

The same ANN previously described for the binary hypothesis test was used for the
detection of multiple features. The only difference was that the multiple hypothesis ANN was
constructed with two output nodes versus the single output node for the binary ANN. The
network was trained to emulate the likelihood functions produced by the MNP test. The MNP
results were selected for training because of the MNP algorithm's computational efficiency and, as
will be discussed later, the fact that its performance was nearly identical to that of the chi-squared
test.


5.  EXPERIMENTAL  RESULTS  AND  DISCUSSION 


The first phase in the process of evaluating the neural network's performance involved
training an ANN to emulate the LRT binary hypothesis test of detecting a jump in the presence of
noise. (The training method was described earlier in section 3.4.) In this phase, a network with a
single output was used. ROC curves were produced using synthetic data sequences generated by
adding Gaussian white noise, with zero mean and unit variance, to jumps with various amplitudes,
providing a range of signal-to-noise ratio (SNR) values, and comparing the resulting output of the
ANN with a range of thresholds. Here, SNR is defined as

$$SNR = \sqrt{M^T V^{-1} M},$$

where V is the covariance matrix $\sigma_n^2 I$. The results were accumulated for 10,000 Monte Carlo trials
for each of the SNR levels shown in figure 3, which is a plot of $P(H_1|H_1, \varepsilon)$ for varying $\varepsilon$, and
specific jump amplitudes as a parameter. It can be seen (figures 10 and 11) that the ANN matched
the theoretical predictions of a GLRT. These results show that it is possible to train an ANN to
efficiently emulate a Neyman-Pearson test procedure.
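
A Monte Carlo estimate of one ROC point can be sketched as follows. This illustration applies the two-sided jump statistic directly rather than a trained network, uses unit-variance noise, and takes the erf[.] of the earlier expressions to be the unit normal distribution function; the trial count, seed, and function names are arbitrary, hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def jump_statistic(z):
    """Normalized jump statistic for unit-variance noise: sum of z over sqrt(K)."""
    return sum(z) / math.sqrt(len(z))


def monte_carlo_roc_point(snr, eps, K=15, trials=4000, seed=1):
    """Estimate (P_F, P_D) of the test |r| > eps for a jump with the given SNR."""
    rng = random.Random(seed)
    amplitude = snr / math.sqrt(K)  # SNR = amplitude * sqrt(K) when sigma_n = 1
    false_alarms = detections = 0
    for _ in range(trials):
        noise_only = [rng.gauss(0.0, 1.0) for _ in range(K)]
        with_jump = [amplitude + rng.gauss(0.0, 1.0) for _ in range(K)]
        false_alarms += abs(jump_statistic(noise_only)) > eps
        detections += abs(jump_statistic(with_jump)) > eps
    return false_alarms / trials, detections / trials


def theoretical_roc_point(snr, eps):
    """Closed-form (P_F, P_D) of the two-sided test on a unit-variance Gaussian statistic."""
    phi = NormalDist().cdf
    return 2.0 * (1.0 - phi(eps)), phi(-eps - snr) + 1.0 - phi(eps - snr)
```

Sweeping the threshold for a fixed SNR traces out one ROC curve; the empirical and theoretical points should agree to within the Monte Carlo sampling error.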

Experiments were then conducted to compare the ANN's performance as a feature detector
and discriminator against the XSQ and MNP techniques when multiple features are present. Here,
a synthetic data sequence was generated as previously described, except that the signal levels were
defined by

$$SL(a_0) = a_0\,\frac{\sqrt{K}}{\sigma_n}$$

and

$$SL(a_1) = a_1\,\frac{\sqrt{(2K^3 - 3K^2 + K)/6}}{\sigma_n}.$$

The results were accumulated as follows. Figures 12a and 12b show the overall probability of
detecting a jump feature given that a jump is present (regardless of the presence of drift) versus the
overall probability of detecting a jump given that no jump is present (again regardless of drift) for
the XSQ test and the ANN, respectively. That is, $[P(H_1|H_1,\varepsilon) + P(H_1|H_3,\varepsilon)]$ is plotted versus
$[P(H_1|H_0,\varepsilon) + P(H_1|H_2,\varepsilon)]$. As can be seen from the figures, the ANN somewhat outperforms
the XSQ test for low- to mid-range false alarm probabilities but is subsequently outperformed at
higher false alarm probabilities. The results for the MNP were nearly identical to those for the
XSQ test and are not presented. Similar results (not shown) were obtained for detection of
drift by the three techniques.


Figures 13a-d and 14a-d show the components that make up the overall curves of figures
12a and 12b, respectively. Figures 13a and 14a show (for the XSQ test and ANN, respectively)
the probability of detecting a jump when only drift is present versus the probability of detecting a
jump when only noise is present (a false alarm), i.e., $P(H_1|H_2,\varepsilon)$ is plotted versus $P(H_1|H_0,\varepsilon)$.
Figures 13b and 14b show the probability of detecting a jump given that a jump occurred versus
the false alarm probability, i.e., $P(H_1|H_1,\varepsilon)$ vs $P(H_1|H_0,\varepsilon)$. Figures 13c and 14c show the
probability of detecting a jump in the presence of an interfering drift with the same polarity versus
the false alarm probability, i.e., $P(H_1|H_3,\varepsilon)$ vs $P(H_1|H_0,\varepsilon)$. Finally, figures 13d and 14d are
constructed in the same manner as 13c and 14c except that the features have opposite polarity.

Figures  13a-d  clearly  show  that  the  XSQ  test  for  jump  is  insensitive  to  the  presence  of  an 
interfering  drift.  However,  examination  of  figures  14a-d  shows  that  the  ANN  has  an  unwanted 
sensitivity  to  the  interfering  signal.  This  is  especially  evident  in  a  comparison  of  figures  14c  and 
14d.  Here,  a  drift  with  the  same  polarity  aids  detection  of  a  jump,  while  a  drift  with  opposite 
polarity  inhibits  detection.  Similar  results  were  obtained  for  drift  and,  once  more,  the  MNP  results 
were  nearly  identical  to  the  XSQ  results.  The  difference  in  sensitivity  to  interfering  signals  between 
the  ANN  and  the  other  two  techniques  can  be  seen  by  comparing  figures  13a  and  14a. 

When one views the overall curves of figure 12, the ANN appears to produce somewhat
better performance than the XSQ test. This is the result of two phenomena. The first is that there
are more detections of the desired signal when it is present due to the interfering signal (see figures
13b and 14b), and the second is a thresholding of the sensitivity to the interfering signal at
low false alarm probabilities (see figure 14a). For these reasons, care was taken in comparing
these techniques. This view is consistent with the concerns brought out in Lau and Widrow
(reference 8). It is felt that, because the MNP (which the ANN was trained to emulate) does not
exhibit this sensitivity to an interfering signal, and the ANN was successfully trained to emulate the
GLRT in the binary hypothesis test, the sensitivity of the ANN can be overcome by a better
selection of the training pattern set and/or a different network architecture. That is, the ANN has
yet to fully learn this input/output relationship.


Figure 13a. XSQ Jump Output Given Drift Only
[Plot: probability of detection versus probability of false alarm.]


Figure 13b. XSQ Jump Output Given Jump Only
[Plot: probability of detection versus probability of false alarm.]


Figure 13c. XSQ Jump Output Given Drift with Same Polarity
[Plot: probability of detection versus probability of false alarm; curves parameterized by jump and drift polarity and SNR, plus a noise-only curve.]


Figure 13d. XSQ Jump Output Given Drift with Opposite Polarity
[Plot: probability of detection versus probability of false alarm; curves parameterized by jump and drift polarity and SNR, plus a noise-only curve.]


Figure 14b. ANN Jump Output Given Jump Only
[Plot: probability of detection versus probability of false alarm.]


6.  SUMMARY  AND  CONCLUSIONS 


The problem of model assessment for nonlinear systems with uncertain models involves,
among other things, detection and classification of mismodeling by observing residual sequences.
Three methods for feature detection and extraction were developed and their performance
investigated. The first method for multiple hypothesis testing was derived from a likelihood ratio
test and resulted in a chi-squared (XSQ) test statistic. The second approach was arrived at through
an ad hoc development that was designed to produce a similar test criterion. This approach
resembles a modified Neyman-Pearson (MNP) test and is more computationally efficient than the
XSQ test. The third method used an artificial neural network (ANN), which is a nontraditional
approach to hypothesis testing. The ANN was trained to emulate the MNP test.

To facilitate analysis, the neural network was first trained and tested for the binary signal
detection problem. Its performance was found to be equal to the theoretical bound of a generalized
likelihood ratio test. For the multiple-feature application, the two classical techniques (a XSQ test
and MNP test) were also presented for comparison. The signals considered were combinations of
the different features of jump and drift at varying signal levels. Performance of the ANN relative
to that of the other techniques was analyzed by comparing the percentage of detections of various
feature combinations.
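As a rough illustration of how detection percentages can be compared at matched false alarm probabilities, the following sketch builds an empirical operating curve from Monte Carlo test statistics. It is not the procedure used in this report. The helper name `empirical_roc`, the trial count, and the signal model (a mean shift of 3.0 in unit-variance Gaussian noise, squared to give a chi-squared-type statistic) are assumptions made only for the example.

```python
import numpy as np

def empirical_roc(stat_h0, stat_h1, pfa_grid):
    """Detection fractions at thresholds set from the noise-only statistics."""
    # Threshold for each target false alarm probability: the (1 - pfa) quantile under H0.
    thresholds = np.quantile(stat_h0, 1.0 - np.asarray(pfa_grid))
    # Fraction of signal-present trials whose statistic exceeds each threshold.
    pd = (stat_h1[:, None] > thresholds[None, :]).mean(axis=0)
    return thresholds, pd

rng = np.random.default_rng(1)
trials = 5000                                             # hypothetical trial count
stat_noise = rng.standard_normal(trials) ** 2             # noise only: chi-squared, 1 dof
stat_signal = (rng.standard_normal(trials) + 3.0) ** 2    # assumed mean shift of 3.0
pfa_grid = np.array([0.01, 0.05, 0.1, 0.2])

thresholds, pd = empirical_roc(stat_noise, stat_signal, pfa_grid)
for pfa, p in zip(pfa_grid, pd):
    print(f"PFA={pfa:.2f}  PD={p:.3f}")
```

Repeating the same computation for each feature combination (for example, jump only versus jump with drift) gives the constituent curves that can then be compared across detectors.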

The results show that, although the ANN provides a somewhat better overall probability of
detection (i.e., detection of jump with and without drift) at low probabilities of false alarm than the
XSQ test, the apparent improvement in performance is tempered by the sensitivities of the ANN to
an interfering signal. Viewing the constituents of the detection probability (i.e., jump detection
without drift versus jump detection in the presence of drift) shows that feature detection is
enhanced by the presence of an interfering signal with the same polarity, but it is inhibited by a
signal with the opposite polarity. This is considered an undesirable effect. Hence, either an
adequate network configuration or ample training of the network was lacking. However, because
of the ANN's success with the binary hypothesis test, it is anticipated that further design
modification can successfully eliminate the undesirable effects of the interfering signal, and that the
ANN's performance can be made to match or exceed that of the XSQ and MNP techniques.




REFERENCES 


1. J. Baylog, A. A. Magliaro, S. M. Zile, and K. F. Gong, "Underwater Tracking in the
Presence of Modeling Uncertainty," Proceedings of the Twenty First Asilomar Conference
on Signals, Systems and Computers, November 1987.

2.  B.  D.  O.  Anderson  and  J.  B.  Moore,  Optimal  Filtering,  Prentice-Hall,  Englewood  Cliffs, 
NJ,  1979. 

3. H. L. Van Trees, Detection, Estimation, and Modulation Theory, John Wiley and Sons,
New York, 1968.

4.  R.  L.  Plackett,  Regression  Analysis,  Oxford  University  Press,  New  York,  1960. 

5. C. R. Rao, Linear Statistical Inference and Its Applications, John Wiley and Sons, New
York, 1965.

6. A. Stuart and J. K. Ord, Kendall's Advanced Theory of Statistics, 5th Edition, Vol. 2,
Oxford University Press, New York, 1991.

7.  P.  G.  Hoel,  S.  C.  Port,  and  C.  J.  Stone,  Introduction  to  Statistical  Theory,  Houghton 
Mifflin,  Boston,  MA,  1971. 

8. C. G. Y. Lau and B. Widrow, "Neural Networks I: Theory and Modeling (Scanning the
Issues)," Proceedings of the IEEE: Special Issue on Neural Networks, vol. 78, no. 9,
September 1990, pp. 1411-1413.




INITIAL DISTRIBUTION LIST

Addressee                                                          No. of Copies

Chief of Naval Operations (OP-224)                                       1
Chief of Naval Research (OCNR-1271, J. Smith;
    OCNR-23, A. Faulstich; OCNR-232, D. Houser)                          3
Naval Sea System Command (SEA-06UR, SEA-06UR3, SEA-06UR4)                3
Program Executive Officer, Submarine Combat
    and Weapons Systems (PMO-409, R. Dosti; PMO-418)                     2
Commander, Submarine Force Atlantic Fleet                                1
Commander, Submarine Force Pacific Fleet                                 1
Commander, Submarine Development Squadron Twelve                         1
Naval Command and Control and Ocean
    Surveillance Center (Code 57, R. Moore)                              1
Naval Postgraduate School                                                1
Naval Surface Warfare Center (Code U04, M. Stripling)                    1
Defense Advanced Research Projects Agency                                1
Defense Technical Information Center                                     2
Center for Naval Analyses                                                1


